
feat(install): slim install for remote/NIM-only inference on Intel Mac#1830

Open
charlesbluca wants to merge 30 commits into NVIDIA:main from charlesbluca:slim-install

Conversation

@charlesbluca
Collaborator

@charlesbluca charlesbluca commented Apr 9, 2026

Description

This branch streamlines the default NeMo Retriever install and dependency story, and tightens optional vs core packages in nemo_retriever/pyproject.toml.

Dependency and install model

  • Core dependencies focus on orchestration (Ray), ingest API/client, HTTP stack, LanceDB, PDF/image/NIM-related libs, and shared utilities. nv-ingest is no longer a direct dependency of nemo-retriever; editable installs still wire nv-ingest-api and nv-ingest-client from the monorepo paths.
  • New optional groups: [local] (GPU-oriented torch, transformers, Nemotron model packages, vLLM on Linux, etc.), [multimedia] (audio/ASR + SVG via cairosvg), [stores] (DuckDB and Neo4j), [benchmarks] (datasets / open-clip), and [all] as a convenience meta-extra.
  • Torch / torchvision resolution: CUDA 13 wheel index on Linux and Windows; macOS falls back to PyPI CPU wheels. The renamed UV index (pytorch-cu130) matches that layout.
  • uv.lock is regenerated to match the slimmer dependency graph.

Docs

  • nemo_retriever/README.md: clearer split between remote NIM-only vs local GPU setups; documents [local] and the CUDA 13 torch override; SVG install now points at [multimedia] (with a direct cairosvg alternative).

Runtime and API behavior

  • Lazy / guarded imports and small compatibility fixes (e.g. Ray 2.49) across ingest API utilities and retriever modules so optional stacks fail more gracefully when extras are not installed.
  • Text embeddings: CPU embedding path supports remote HTTP when an endpoint URL is set, otherwise local Hugging Face on CPU; shared URL normalization avoids doubling /embeddings on HTTP embedding URLs (delegates to nv_ingest_api helper). Tests aligned with the local-HF default when no remote URL is configured.
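The URL normalization mentioned above can be sketched as an idempotent helper. This is a hypothetical simplification (`ensure_embeddings_suffix` is an illustrative name; the real logic lives in the `nv_ingest_api` `ensure_openai_embeddings_http_url` helper, whose exact signature may differ):

```python
def ensure_embeddings_suffix(url: str, suffix: str = "/embeddings") -> str:
    """Append the embeddings path segment exactly once (idempotent)."""
    trimmed = url.rstrip("/")
    if trimmed.endswith(suffix):
        return trimmed  # already normalized; avoid /embeddings/embeddings
    return trimmed + suffix
```

Because the helper is idempotent, callers can apply it unconditionally to both bare base URLs and already-normalized endpoint URLs.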

CI

  • Unit tests install nemo_retriever[all,dev] instead of separately installing editable src/api/client packages plus nemo_retriever.
  • New scheduled / dispatch workflow integration-test-library-mode.yml: library-mode graph pipeline on Windows and macOS (ARM and Intel), using NVCF_API_KEY for remote NIM endpoints.

Misc

  • Removes obsolete agent documentation; assorted small fixes (e.g. Milvus client usage, executor/ocr/pdf paths) driven by the slimmer dependency surface.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • If adjusting docker-compose.yaml environment variables, I have ensured those are mirrored in the Helm values.yaml file.

charlesbluca and others added 16 commits April 8, 2026 17:56
…d macOS

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
macOS CI runners (Apple Silicon) can be slow to start the Ray raylet,
causing the node to time out during startup. Setting the wait to 120s
prevents premature failures on resource-constrained runners.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Guard cv2, sklearn, langdetect, minio, pymilvus, and unstructured_client
behind try/except or deferred imports so nemo_retriever and nv_ingest_api
can be imported without GPU/local-inference packages installed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
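The guard described above follows a common optional-import pattern: the heavy package resolves to a None sentinel at import time, and call sites branch on availability. A minimal sketch, using langdetect as the example (`detect_language` is an illustrative helper, not the repository's API):

```python
# Optional packages resolve to a None sentinel so the module imports
# cleanly in a slim install that lacks the extras.
try:
    import langdetect  # optional extra; absent without [local]/[all]
except ImportError:
    langdetect = None


def detect_language(text: str) -> str:
    """Best-effort language detection; degrade gracefully without extras."""
    if langdetect is None:
        return "unknown"  # slim-install fallback path
    return langdetect.detect(text)
```

The same shape applies to cv2, sklearn, minio, pymilvus, and unstructured_client: the fallback branch either degrades (as here) or raises a descriptive error at call time rather than at import time.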
- pdfium.py: numpy channel-swap fallback when cv2 is None
- extract.py: PIL encode fallback for page image rendering without cv2
- transforms.py: PIL fallbacks for numpy_to_base64, base64_to_numpy
- table_and_chart.py (api + nemo_retriever): guard bare sklearn DBSCAN
  imports with ImportError fallback using naive y-grid row grouping
- ocr/shared.py: same guard for _blocks_to_pseudo_markdown DBSCAN
- lancedb_store.py: skip index creation when no embedding rows written
- text_embed/runtime.py: raise on embedding failure instead of silent no-op
- graph/executor.py: per-stage debug logging for inprocess pipeline
- text_embed/cpu_operator.py: debug logging for embed input/output stats

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Windows also supports CUDA torch wheels; extend the platform marker to
sys_platform == 'linux' or sys_platform == 'win32' so Windows users
get the GPU-enabled wheel from the PyTorch CUDA index rather than
falling through to the CPU-only PyPI wheel.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@charlesbluca charlesbluca requested review from a team as code owners April 9, 2026 15:15
@charlesbluca charlesbluca requested a review from nkmcalli April 9, 2026 15:15
@greptile-apps
Contributor

greptile-apps bot commented Apr 9, 2026

Greptile Summary

This PR streamlines the nemo_retriever install surface by splitting heavy GPU/ML deps into optional groups ([local], [multimedia], [stores], [benchmarks]), adds PIL fallbacks for every cv2-dependent code path so that the slim [remote] install works on Intel Mac, normalizes the embedding URL construction via a shared ensure_openai_embeddings_http_url helper, and fixes several usability issues (Ray 2.49 concurrency requirement, docstring placement, empty-file guard in text splitting). All previously identified P0/P1 concerns from prior review threads have been addressed.

Confidence Score: 5/5

Safe to merge; all previously identified P0/P1 concerns are addressed and only P2 style/edge-case findings remain.

The milvus.py None sentinel fix, _resize_image_opencv cv2 crash fix, and graceful LanceDB core-dep handling resolve the main prior concerns. New findings are a PIL RGBA compositing mismatch, a dead except ImportError, inline logging import, and a missing empty-content guard in txt_bytes_to_chunks_df — all P2 and none blocking.

api/src/nv_ingest_api/util/image_processing/transforms.py (PIL RGBA fallback), nemo_retriever/src/nemo_retriever/txt/split.py (empty-bytes guard)

Important Files Changed

Filename Overview
api/src/nv_ingest_api/util/image_processing/transforms.py Adds PIL fallbacks for every cv2 code path; minor RGBA compositing mismatch and a vestigial except ImportError remain.
nemo_retriever/src/nemo_retriever/text_embed/runtime.py Converts print/BaseException error handler to proper logger.error/Exception with graceful DataFrame return; correct and intentional behavior change.
nemo_retriever/src/nemo_retriever/text_embed/cpu_operator.py Adds local HF embedder path when no remote endpoint is configured; logic and lazy import pattern are correct.
nemo_retriever/src/nemo_retriever/graph/executor.py Adds Ray 2.49 concurrency=1 compatibility fix and per-stage debug logging; logging import placed inside method body rather than at module level.
nemo_retriever/pyproject.toml Moves GPU/ML deps to [local], audio/SVG to [multimedia], SQL/graph stores to [stores], benchmarks to [benchmarks]; lancedb stays in core; torch index renamed to pytorch-cu130 with correct platform markers.
api/src/nv_ingest_api/util/string_processing/init.py New ensure_openai_embeddings_http_url helper is idempotent and correctly prevents /embeddings/embeddings doubling.
.github/workflows/integration-test-library-mode.yml New nightly integration test for Windows/macOS ARM/Intel; third-party actions still on mutable version tags (previously noted), and no permissions: block is declared.
nemo_retriever/src/nemo_retriever/vector_store/lancedb_store.py Removes redundant try/except around import lancedb inside _write_rows_to_lancedb; adds empty-rows guard in handle_lancedb; import lancedb at module level is appropriate since lancedb is a core dependency.
nemo_retriever/src/nemo_retriever/txt/split.py Fixes docstring placement and adds empty-content early-return in txt_file_to_chunks_df; txt_bytes_to_chunks_df is missing the same guard.

Sequence Diagram

sequenceDiagram
    participant GP as graph_pipeline
    participant CPU as _BatchEmbedCPUActor
    participant RT as embed_text_main_text_embed
    participant HF as create_local_embedder
    participant NIM as Remote NIM /embeddings

    GP->>CPU: __init__(EmbedParams)
    alt embed_invoke_url set
        CPU->>CPU: self._model = None
        Note over CPU: Remote path
    else no endpoint
        CPU->>HF: create_local_embedder(model_name, device=cpu)
        HF-->>CPU: local HF model
        Note over CPU: Local CPU path
    end

    GP->>CPU: process(df)
    CPU->>RT: embed_text_main_text_embed(df, model=self._model, ...)
    alt model is not None (local)
        RT->>HF: model.embed(texts)
        HF-->>RT: vectors
    else endpoint is set (remote)
        RT->>NIM: POST /embeddings (ensure_openai_embeddings_http_url)
        NIM-->>RT: embeddings JSON
    end
    alt success
        RT-->>CPU: out_df (has_embedding=True)
    else Exception
        RT->>RT: logger.error(..., exc_info=True)
        RT-->>CPU: out_df (has_embedding=False, error=str(exc))
    end
    CPU-->>GP: out_df
Prompt To Fix All With AI
This is a comment left during a code review.
Path: api/src/nv_ingest_api/util/image_processing/transforms.py
Line: 697-718

Comment:
**PIL RGBA fallback composites on black, cv2 path uses white**

When cv2 is unavailable, `Image.open(BytesIO(image_bytes)).convert("RGB")` composites alpha onto a black background (PIL's default), while the cv2 path calls `rgba_to_rgb_white_bg` which blends against white. For document images (which are typically white-background), this produces visually different results in the cv2-free path.

```suggestion
        else:
            from io import BytesIO

            pil_img = Image.open(BytesIO(image_bytes))
            if pil_img.mode == "RGBA":
                background = Image.new("RGB", pil_img.size, (255, 255, 255))
                background.paste(pil_img, mask=pil_img.split()[3])
                pil_img = background
            else:
                pil_img = pil_img.convert("RGB")
            img = np.array(pil_img)
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: api/src/nv_ingest_api/util/image_processing/transforms.py
Line: 714-715

Comment:
**Vestigial `except ImportError: raise`**

After the refactoring the only potential `ImportError` source inside this `try` block is `from io import BytesIO`, which is a stdlib module that cannot fail. The clause is dead code and may confuse future readers.

```suggestion
    except Exception as e:
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/graph/executor.py
Line: 131-133

Comment:
**Module-level logger preferred over inline import**

`import logging as _logging` and the `_exec_log = ...` call are placed inside `execute()` on every invocation. The standard pattern is a single module-level `logger = logging.getLogger(__name__)`. If the goal was to avoid conflicts with an existing name, the existing module-level logger can simply be reused.

Move `import logging` and `logger = logging.getLogger(__name__)` to module scope alongside the existing imports, then replace `_exec_log` references with `logger`.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/txt/split.py
Line: 217-220

Comment:
**`txt_bytes_to_chunks_df` missing empty-content guard**

`txt_file_to_chunks_df` received an early return for empty/whitespace-only content (returning a correctly-typed empty DataFrame). `txt_bytes_to_chunks_df` did not get the same treatment. With empty bytes, `raw` will be `""`, `split_text_by_tokens` may return `[]`, and `pd.DataFrame([])` produces a 0-column DataFrame — downstream callers that access `df["text"]` would raise `KeyError`.

```python
raw = content_bytes.decode(encoding, errors="replace")
if not raw or not raw.strip():
    return pd.DataFrame(
        columns=["text", "path", "page_number", "metadata"],
    ).astype({"page_number": "int64"})
model_id = tokenizer_model_id or DEFAULT_TOKENIZER_MODEL_ID
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: .github/workflows/integration-test-library-mode.yml
Line: 1-10

Comment:
**Missing `permissions:` block**

Per the `github-actions-security` rule, new workflows should declare an explicit least-privilege `permissions:` block. Without it the job inherits the repository's default token permissions. A read-only scope is sufficient for this checkout-and-test workflow:

```yaml
permissions:
  contents: read
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (5): Last reviewed commit: "test(embed): align BatchEmbedCPUActor te..."

Comment on lines +47 to +54

      - name: Set up Python 3.12
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - name: Install uv
        run: pip install uv
Contributor


P2 GitHub Actions not pinned to commit SHA

Both actions/checkout@v4 and actions/setup-python@v5 use mutable version tags. Per the repository's github-actions-security rule, third-party actions must be pinned to a full commit SHA to prevent supply-chain attacks.

Suggested change

      - name: Check out repository code
        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
        with:
          ref: ${{ inputs.source-ref != '' && inputs.source-ref || github.ref }}

      - name: Set up Python 3.12
        uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065  # v5.6.0
        with:
          python-version: '3.12'
Prompt To Fix With AI
This is a comment left during a code review.
Path: .github/workflows/integration-test-library-mode.yml
Line: 47-54

Comment:
**GitHub Actions not pinned to commit SHA**

Both `actions/checkout@v4` and `actions/setup-python@v5` use mutable version tags. Per the repository's `github-actions-security` rule, third-party actions must be pinned to a full commit SHA to prevent supply-chain attacks.

```suggestion
      - name: Check out repository code
        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
        with:
          ref: ${{ inputs.source-ref != '' && inputs.source-ref || github.ref }}

      - name: Set up Python 3.12
        uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065  # v5.6.0
        with:
          python-version: '3.12'
```

How can I resolve this? If you propose a fix, please make it concise.

@charlesbluca charlesbluca marked this pull request as draft April 9, 2026 15:33
charlesbluca and others added 5 commits April 9, 2026 15:53
- Wrap bare `import lancedb` in try/except in lancedb_store.py; add None
  guard in handle_lancedb for [remote]-only installs
- Make graph_pipeline.py import of handle_lancedb lazy to avoid startup
  crash when lancedb is not installed
- Add None sentinels for all optional pymilvus/minio/scipy imports in
  milvus.py (previously only CONSISTENCY_BOUNDED was set)
- Guard cv2.INTER_LANCZOS4 and cv2.resize in _resize_image_opencv with
  PIL fallback when cv2 is unavailable
- Restore graceful degradation in embed_text_main_text_embed: return
  error-payload DataFrame instead of re-raising on embedding failure

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…izer

- Restore handle_lancedb as module-level import in graph_pipeline.py so
  monkeypatch.setattr works in batch_pipeline tests (safe now that
  lancedb_store.py guards its own import lancedb)
- Early-return in txt_file_to_chunks_df before calling tokenizer when
  file content is empty, matching the pattern in html/convert.py
- Mock _get_tokenizer/_get_txt_tokenizer in txt and html unit tests to
  avoid hitting HuggingFace (proxy-blocked in this environment)
- Skip audio benchmark test gracefully when Ray cannot write to the
  filesystem (EROFS) rather than failing

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Move docstrings in txt_file_to_chunks_df and txt_bytes_to_chunks_df
  to immediately follow the def line so they are recognized as actual
  docstrings (previously appeared after variable assignments)
- Remove conflicts block in pyproject.toml that referenced undefined
  cpu/cu130 extras; torch wheel selection is already handled by
  sys_platform markers in [tool.uv.sources]

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Merge remote extra into base dependencies (document parsing and NIM
  client libs are now core; no extra needed for remote-only installs)
- Move lancedb into base dependencies (it is the default VDB solution)
- Merge local-cpu and local-gpu into a single local extra (GPU assumed)
- Drop lancedb from [stores]; update [all] to reference [local]
- Remove now-unnecessary lancedb import guards in lancedb_store.py
- Update integration-test workflow to install bare nemo_retriever
- Update DEPENDENCY_LAYERS.md to reflect new structure

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
charlesbluca and others added 7 commits April 9, 2026 16:52
- Replace core-only install with [local] extra for local GPU path; add
  separate remote-only install command for NIM inference users
- Reframe Step 2 torch override: clarifies it is a PyPI/CUDA workaround,
  not a completion of the GPU stack; mark it skip-able for remote users
- Add note before pipeline examples that they require [local] + CUDA torch
- Fix [svg] extra reference to [multimedia]

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…workflow

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Made-with: Cursor

# Conflicts:
#	nemo_retriever/pyproject.toml
- Add ensure_openai_embeddings_http_url in nv-ingest string_processing

- Use it in infer_microservice (api + client) and retriever main_text_embed

- CPU embed actor: local HF when no remote endpoint instead of default integrate URL

Made-with: Cursor
After the embed URL fix, the CPU actor uses a local embedder when no
HTTP endpoint is set. Update the unit test to assert that behavior and
mock create_local_embedder so the suite does not require HF revisions.

Made-with: Cursor
@charlesbluca charlesbluca marked this pull request as ready for review April 13, 2026 16:52